Liveability Analysis by Location
Authored by: Adam Bullivant and Basilia Sethu
Duration: 120 mins
Level: Intermediate
Pre-requisite Skills: Python, Data Science, and Data Engineering
Scenario

1. As a potential resident, I want to identify the places best suited for my family and me to live.

Individuals or families may be looking to move to Melbourne but be unsure which locations would suit them best. Families may want areas with good schools, adventure seekers may look for locations with planned future activities, and workers may seek the location with the best predicted job growth.

2. As a business owner, I want to predict future economic growth areas to choose suitable business locations.

Business owners may want to predict which suburbs or locations are expected to grow in the future, but they may lack the resources to do so. Real estate investors may want to invest in high-demand areas to increase the ROI on their property, and small business owners may want to branch out into a new area with strong predicted economic growth.

What Will This Use Case Teach You?

At the end of this use case you will:

  • Learn how to import various types of data from different sources
  • Know how to clean and structure data in a desired format for analysis or visualisation
  • Learn how to visualise various types of geolocation data using plotly
  • Create interactive maps that change based on user input
Introduction/Background to Problem

Melbourne was named the most liveable city in Australia, and 10th in the world, according to the Global Liveability Index 2022. The city achieved a perfect score for education (100/100) as well as infrastructure (100/100). Furthermore, it scored highly for Culture and Environment (98.6/100) and Stability (95/100) (Study Melbourne). This information is great for those living, or wishing to live, in Melbourne; however, it doesn't detail the specific locations and data behind why Melbourne is such a great city to live in. How can people wanting to move to Melbourne see exactly where these liveability metrics come from?

Visualisations provide a quick and easy way to interpret large amounts of data, and many insights can be drawn from this method of analysis. As discussed above, there are currently no visualisations that let individuals or businesses see liveability characteristics on a map; instead, headline figures are released that do not tell the full story.

The City of Melbourne (CoM) has various datasets that contribute to Melbourne's overall liveability. Specifically, they can be aligned with the following:

Key Factors of Liveability

  • Access to Housing
  • Education Opportunities
  • Employment Opportunities
  • Access to Health and Support Services

The goal of this analysis is to reveal individual suburbs, blocks and locations around Melbourne that would score high on liveability based on the above factors. You will be able to visualise why Melbourne scores so high on liveability, and dive deeper into specific locations that would contribute to this overall success.

Datasets

*CoM Datasets*

  • City of Melbourne Jobs Forecasts by Small Area 2020-2040
  • Development Activity Monitor
  • Free and cheap support services, with opening hours, public transport and parking options (Helping Out)

*Vic Government Datasets*

  • School Locations 2022
Walkthrough Steps

Table of Contents

  1. Import Libraries
  2. Connect and Test Datasets
  3. Analysis of Datasets
    • 3.1. City of Melbourne Jobs Forecasts by Small Area 2020-2040
    • 3.2. School Locations 2022
    • 3.3. Development Activity Monitor
    • 3.4. Free and Cheap Support Services
  4. Combining all Visualisations and Data
  5. Findings and Opportunities
  6. Thank You!
Analysis
1. Import Libraries

To begin, we first import the necessary libraries to support our data analysis and visualisation using Melbourne Open Data.

In [1]:
# Standard
import os
import json

# Data import
import urllib
from urllib.request import urlopen 
import requests
from sodapy import Socrata

# Data manipulation
import pandas as pd

# Plotting 
import plotly.graph_objs as go
import plotly.express as px
2. Connect and Test Datasets

To connect to the Melbourne Open Data Portal and gather data, we must use v2 of its API. In this method, we take the unique dataset id (usually the name, as seen below after /datasets/) and build a custom URL.

In [2]:
# Job forecast data
jf_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/city-of-melbourne-jobs-forecasts-by-small-area-2020-2040/exports/json?limit=-1&offset=0&timezone=UTC'
r = requests.get(jf_url)
response = r.json()
jf_data = pd.DataFrame(response)

# Development data
dev_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/development-activity-monitor/exports/json?limit=-1&offset=0&timezone=UTC'
r = requests.get(dev_url)
response = r.json()
dev_data = pd.DataFrame(response)

# Free and cheap support services
ss_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/free-and-cheap-support-services-with-opening-hours-public-transport-and-parking-/exports/json?limit=-1&offset=0&timezone=UTC'
r = requests.get(ss_url)
response = r.json()
ss_data = pd.DataFrame(response)
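The three requests above follow an identical pattern, so they could be factored into a small helper. This is just a sketch: the `export_url` and `fetch_dataset` names are our own, not part of the portal's API.

```python
import pandas as pd
import requests

BASE = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets'

def export_url(dataset_id: str) -> str:
    """Build the v2 JSON export URL for a given dataset id."""
    return f'{BASE}/{dataset_id}/exports/json?limit=-1&offset=0&timezone=UTC'

def fetch_dataset(dataset_id: str) -> pd.DataFrame:
    """Download a full dataset export and return it as a DataFrame."""
    r = requests.get(export_url(dataset_id))
    r.raise_for_status()  # surface HTTP errors early
    return pd.DataFrame(r.json())

# e.g. jf_data = fetch_dataset('city-of-melbourne-jobs-forecasts-by-small-area-2020-2040')
```

This keeps the dataset ids in one place, so adding another dataset later is a one-line change.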

We also need to connect to the School Locations 2022 dataset from the Victorian Government. To do so, we use the standard urllib library to connect to the data API (URL Imports). This method opens the URL and reads the response (data) into a dataframe.

In [3]:
# School Locations 2022
url = 'https://www.education.vic.gov.au/Documents/about/research/datavic/dv331_schoollocations2022.csv' # url
with urllib.request.urlopen(url) as response:
    sl_data = pd.read_csv(response, encoding='cp1252') # dataframe

Next, we will look at one specific dataset to better understand its structure and how we can use it. For this exercise, we will observe the City of Melbourne Jobs Forecasts by Small Area 2020-2040 dataset; specifically, its first ten rows.

In [4]:
# Print details of data
print(f'The shape of the dataset is: {jf_data.shape}')
print()
print('The first ten rows of this dataset are:')

# Print the first 10 rows of data
jf_data.head(10)
The shape of the dataset is: (9114, 5)

The first ten rows of this dataset are:
Out[4]:
geography year category industry_space_use value
0 City of Melbourne 2024 Jobs by industry Accommodation 10734
1 City of Melbourne 2027 Jobs by industry Accommodation 11913
2 City of Melbourne 2029 Jobs by industry Accommodation 12489
3 City of Melbourne 2030 Jobs by industry Accommodation 12785
4 City of Melbourne 2031 Jobs by industry Accommodation 13086
5 City of Melbourne 2033 Jobs by industry Accommodation 13313
6 City of Melbourne 2037 Jobs by industry Accommodation 13739
7 City of Melbourne 2040 Jobs by industry Accommodation 14053
8 City of Melbourne 2041 Jobs by industry Accommodation 14162
9 City of Melbourne 2024 Jobs by industry Admin and support services 17343

We can see that there are 9114 records and 5 fields describing each record. Each record can be broken down into the following fields:

  • geography: Location or Geographical Area e.g., West Melbourne
  • year: Year of forecasted jobs
  • category: Industry/space use type (jobs by industry; jobs by space use)
  • industry_space_use: Industry of relevant jobs or jobs by space use e.g., Health Industry
  • value: Number of jobs forecast for the relevant year

Awesome! After taking a look at one of the CoM datasets we can see its overall structure and contents. Let's now begin our analysis of each individual dataset.

3. Analysis of Datasets

We are now going to analyse each dataset individually so we can generate useful information and visualisations from them. This will assist us when producing the final interactive maps and predictions later on.

3.1. City of Melbourne Jobs Forecasts by Small Area 2020-2040

We are going to visualise the job forecasts in each area by 2040, i.e., all jobs from 2020-2040 by each location listed above in 'geography'. However, we first must create a summary of the data.

In [5]:
# Cast datatypes to correct type so we can analyse and summarise
jf_data[['year', 'value']] = jf_data[['year', 'value']].astype(int)
jf_data = jf_data.convert_dtypes()

# Create summary data frame 
# Group data by geography field, and aggregate by sum of forecasted jobs 
jobsByArea = pd.DataFrame(jf_data.groupby('geography', as_index=False).agg({'value': ['sum']}))

# DataFrame groupby creates two lines of headings
# We flatten the headings to make it easier to extract data for plotting
jobsByArea.columns = jobsByArea.columns.map(''.join) # flatten column header
jobsByArea.rename(columns={'geography': 'featurenam', 'valuesum': 'forecasted_jobs'}, inplace=True) #rename to match GeoJSON format

# Remove 'City of Melbourne' row as this is the sum of all areas and isn't required
jobsByArea = jobsByArea.drop(jobsByArea.index[1])
jobsByArea
Out[5]:
featurenam forecasted_jobs
0 Carlton 1855047
2 Docklands 7081294
3 East Melbourne 2090532
4 Kensington 872976
5 Melbourne (CBD) 21902917
6 Melbourne (Remainder) 2304954
7 North Melbourne 1520826
8 Parkville 2918907
9 Port Melbourne 1681915
10 South Yarra 134480
11 Southbank 4417452
12 West Melbourne (Industrial) 422335
13 West Melbourne (Residential) 530041

We can now see the total forecasted jobs for each area up until 2040.
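As an aside, the positional drop in the cell above (`jobsByArea.index[1]`) relies on groupby sorting the areas alphabetically so that 'City of Melbourne' lands in row 1. Filtering by name achieves the same result more robustly if the data changes; here is a sketch with toy data:

```python
import pandas as pd

# Toy stand-in for the summarised dataframe, including the city-wide total
jobsByArea = pd.DataFrame({'featurenam': ['Carlton', 'City of Melbourne', 'Docklands'],
                           'forecasted_jobs': [10, 100, 20]})

# Filter by name rather than by row position, so the drop still works
# even if the set of areas (and hence the sort order) changes
jobsByArea = jobsByArea[jobsByArea['featurenam'] != 'City of Melbourne']
print(jobsByArea['featurenam'].tolist())  # ['Carlton', 'Docklands']
```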

Next, we are going to visualise the forecasted jobs for each area on a choropleth map. Creating a choropleth map requires us to know the geometry (shape) of each area as a collection of latitude and longitude points defining a polygon. This data can be downloaded from the Melbourne Open Data Portal in GeoJSON format. For our data, we are using the Small Areas for Census Land Use and Employment (CLUE) data, as this aligns with the forecasted jobs dataframe we have. This dataset contains the spatial boundary definitions for the areas in the original 'geography' column of our dataset.

Below we extract the Small Areas for Census Land Use and Employment (CLUE) data.

In [6]:
area_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/small-areas-for-census-of-land-use-and-employment-clue/exports/geojson?limit=-1&offset=0&timezone=UTC'
r = requests.get(area_url)
response = r.json()
area = response

# Display the unique keys for each spatial boundary
area['features'][0]['properties'].keys()
Out[6]:
dict_keys(['geo_point_2d', 'featurenam', 'shape_area', 'shape_len'])

We can see that the 'featurenam' field in the CLUE area data contains the specific area of the given spatial boundary. We are going to use this to match with our jobsByArea dataframe (remember we renamed the 'geography' column to 'featurenam' so it aligns with the CLUE data).

Simply put, our original dataframe has area names as strings (Carlton, Docklands, etc.) without spatial definitions (latitude/longitude points). To plot these areas, we require the spatial definitions, which are present in the CLUE dataset. We essentially use the area names as a key to match records between the datasets, allowing us to plot our original data using the polygons (spatial definitions) from the CLUE data. That key is 'featurenam', which holds the area names.
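Before plotting, it can be worth checking that every area name in the dataframe actually has a matching polygon in the GeoJSON, because an unmatched name is silently dropped from the map. Below is a sketch with toy stand-ins for the real jobsByArea dataframe and area GeoJSON:

```python
import pandas as pd

# Toy stand-ins for the real jobsByArea dataframe and CLUE GeoJSON
jobsByArea = pd.DataFrame({'featurenam': ['Carlton', 'Docklands'],
                           'forecasted_jobs': [1855047, 7081294]})
area = {'features': [{'properties': {'featurenam': 'Carlton'}},
                     {'properties': {'featurenam': 'Docklands'}},
                     {'properties': {'featurenam': 'Parkville'}}]}

# Collect the area names defined by the GeoJSON polygons
geo_names = {f['properties']['featurenam'] for f in area['features']}

# Any dataframe area without a polygon would silently fail to plot
missing = set(jobsByArea['featurenam']) - geo_names
print(missing)  # set() -> every area has a matching polygon
```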

Now using the choropleth_mapbox function we can display a map using the CLUE data (GeoJSON) to define the regions and the jobsByArea dataframe to define the summarised data by area.

In [7]:
# Display the choropleth map
fig = px.choropleth_mapbox(jobsByArea, # pass in the summarised jobs by area
                           geojson=area, # pass in the GeoJSON data defining the areas
                           locations='featurenam', # define the unique identifier for the areas from the dataframe
                           color='forecasted_jobs', # change the colour of the area according to the forecasted jobs
                           color_continuous_scale=["red", "orangered", "orange",
                                                   "yellow", "greenyellow", "green"], # define custom colour scale
                           range_color=(0, jobsByArea['forecasted_jobs'].max()), # set the numeric range for the colour scale
                           featureidkey="properties.featurenam", # define the Unique polygon identifier from the GeoJSON data
                           mapbox_style="carto-darkmatter", # set the visual style of the map
                           zoom=11.9, # set the zoom level
                           center = {"lat": -37.813, "lon": 144.945}, # set the map centre coordinates
                           opacity=0.3, # opacity of the choropleth polygons
                           hover_name='featurenam', # the title of the hover pop up box
                           hover_data={'featurenam':True,'forecasted_jobs':True}, # data in popup box
                           labels={'forecasted_jobs':'Forecasted Jobs','featurenam':'Area'}, # labels for popup box
                           title='New Forecasted Jobs by 2040', # Title for plot
                           width=950, height=800 # dimensions of plot 
                          )
fig.show()

We now have a visualisation of the forecasted jobs by 2040 in each area around Melbourne!

3.2. School Locations 2022

School location data is taken from Data Victoria. This is not CoM data; however, it is still useful and will help us in our analysis.

For analysis and visualisation of the school locations, we are going to map only the schools within a specific area around the CBD, so the result is useful for our end visualisation. If we plotted all the school data, our map would contain points far outside Melbourne, which isn't useful.

First, we are going to have a look at the data to see its characteristics.

In [8]:
sl_data.head(5)
Out[8]:
Education_Sector Entity_Type SCHOOL_NO School_Name School_Type School_Status Address_Line_1 Address_Line_2 Address_Town Address_State ... Postal_Address_Line_1 Postal_Address_Line_2 Postal_Town Postal_State Postal_Postcode Full_Phone_No LGA_ID LGA_Name X Y
0 Government 1 1 Alberton Primary School Primary O 21 Thomson Street NaN Alberton VIC ... 21 Thomson Street NaN ALBERTON VIC 3971 03 5183 2412 681 Wellington (S) 146.66660 -38.61771
1 Government 1 3 Allansford and District Primary School Primary O Frank Street NaN Allansford VIC ... Frank Street NaN ALLANSFORD VIC 3277 03 5565 1382 673 Warrnambool (C) 142.59039 -38.38628
2 Government 1 4 Avoca Primary School Primary O 118 Barnett Street NaN Avoca VIC ... P O Box 12 NaN AVOCA VIC 3467 03 5465 3176 599 Pyrenees (S) 143.47565 -37.08450
3 Government 1 8 Avenel Primary School Primary O 40 Anderson Street NaN Avenel VIC ... 40 Anderson Street NaN AVENEL VIC 3664 03 5796 2264 643 Strathbogie (S) 145.23472 -36.90137
4 Government 1 12 Warrandyte Primary School Primary O 5-11 Forbes Street NaN Warrandyte VIC ... 5-11 Forbes Street NaN WARRANDYTE VIC 3113 03 9844 3537 421 Manningham (C) 145.21398 -37.74268

5 rows × 21 columns

As seen above, there are many columns that won't be useful for our analysis. Let's remove these by creating a new dataframe and storing only the important columns. Remember, we are only focused on plotting the locations of the schools, therefore, most of the data inside the dataset isn't useful.

*Note: The address of each school (Address_Line_1) may appear useful since we want to plot locations; however, when plotting geodata with plotly and Python, latitude and longitude are the best fields to use.*

In [9]:
# Keep only the required columns for plotting
sl_data = sl_data[['School_Name', 'School_Type', 'X', 'Y']]

# Remove null data
sl_data = sl_data.dropna()

# Reset the index of data frame
sl_data = sl_data.reset_index(drop=True)

sl_data.head(5)
Out[9]:
School_Name School_Type X Y
0 Alberton Primary School Primary 146.66660 -38.61771
1 Allansford and District Primary School Primary 142.59039 -38.38628
2 Avoca Primary School Primary 143.47565 -37.08450
3 Avenel Primary School Primary 145.23472 -36.90137
4 Warrandyte Primary School Primary 145.21398 -37.74268

Great! Now we have the data required to plot each school and its type around Melbourne. We are going to use plotly express to create a scatter mapbox, which will plot each school location using the longitude (X) and latitude (Y) columns seen in the dataframe above.

However, before we do this, we need to remove some schools from our data. Currently, the data contains all schools in Victoria, meaning that our map would show all of Victoria, while we are only interested in Melbourne. Specifically, we want the schools around the CoM CBD, so we define a bounding box around the Melbourne CBD (sized to match our map). This gives the following conditions:

  • Longitude (X) must be between 144.88824 and 145.00226
  • Latitude (Y) must be between -37.85019 and -37.77682

We are going to remove all schools that don't meet these conditions:

In [10]:
# Remove if longitude is greater than or less than
sl_data.drop(sl_data[sl_data['X'] < 144.88824].index, inplace = True)
sl_data.drop(sl_data[sl_data['X'] > 145.00226].index, inplace = True)

# Remove if latitude is greater than or less than
sl_data.drop(sl_data[sl_data['Y'] > -37.77682].index, inplace = True)
sl_data.drop(sl_data[sl_data['Y'] < -37.85019].index, inplace = True)

sl_data
Out[10]:
School_Name School_Type X Y
29 Flemington Primary School Primary 144.93392 -37.78067
30 Footscray Primary School Primary 144.89267 -37.79838
51 Fitzroy Primary School Primary 144.98151 -37.79960
70 South Yarra Primary School Primary 144.98562 -37.84135
183 Albert Park Primary School Primary 144.95277 -37.84188
... ... ... ... ...
2233 St Joseph's Flexible Learning Centre Melbourne Special 144.95494 -37.80384
2251 Melbourne Indigenous Transition School Special 144.98926 -37.81954
2262 River Nile School Special 144.95509 -37.80519
2264 Hester Hornbrook Academy Special 144.95615 -37.81654
2290 Ignatius Learning Centre Secondary 144.99848 -37.82191

77 rows × 4 columns

Great! Only 77 schools remain, which looks right for this area. Let's now plot them to see:

In [11]:
fig2 = px.scatter_mapbox(sl_data, lat='Y', lon='X', # plot on latitude and longitude
                        mapbox_style="carto-darkmatter", # style of map
                        zoom=12.15, # set initial zoom
                        center = {"lat": -37.813, "lon": 144.945}, # centre of the map
                        opacity=0.7, # opacity of each marker/dot
                        hover_name="School_Name", # only display the school name when hovered
                        hover_data={"School_Name":False,"School_Type":False,"X":False,"Y":False},
                        color = 'School_Type', # each school type has a different colour
                        color_discrete_sequence=['mediumorchid', 'blue', 'red', 'aqua', 
                                                 'limegreen', 'orange'], # school colours
                        labels={'School_Name':'School Name', 'School_Type':'School Type'}, # change labels
                        title = 'School Locations 2022', # title of map
                        width=950, height=800) # size of map

fig2.update_traces(marker={'size': 10}) # change the marker size

fig2.show()

We can now see the school locations around Melbourne. Each school type corresponds to a colour, and we can see the school name when we hover our mouse over each point!

3.3. Development Activity Monitor

Next, we are going to look at the Development Activity Monitor dataset, which tracks new commercial and residential property development in the City of Melbourne. Since access to housing is a key factor in liveability, we are going to focus specifically on the residential property side of this dataset.

Let's print the first five rows of the dataset to see what it looks like:

In [12]:
dev_data.head(5)
Out[12]:
data_format development_key status year_completed clue_small_area clue_block street_address property_id property_id_2 property_id_3 ... hospital_flr recreation_flr publicdispaly_flr community_flr car_spaces bike_spaces town_planning_application longitude latitude geopoint
0 Pre May 16 X000568 COMPLETED 2012 West Melbourne (Residential) 411 1-13 Abbotsford Street WEST MELBOURNE VIC 3003 100001 None None ... 0 0 0 0 0 0 0 144.943280 -37.807920 {'lon': 144.9432805, 'lat': -37.80791988}
1 Pre May 16 X000557 COMPLETED 2002 West Melbourne (Residential) 401 7-21 Anderson Street WEST MELBOURNE VIC 3003 100435 None None ... 0 0 0 0 0 0 0 144.941547 -37.804777 {'lon': 144.9415469, 'lat': -37.80477682}
2 Pre May 16 X000448 COMPLETED 2015 North Melbourne 314 302-308 Arden Street NORTH MELBOURNE VIC 3051 100509 None None ... 0 0 0 0 24 6 0 144.937724 -37.799250 {'lon': 144.9377236, 'lat': -37.79925034}
3 Pre May 16 X000458 COMPLETED 2004 North Melbourne 330 162-168 Arden Street NORTH MELBOURNE VIC 3051 100519 None None ... 0 0 0 0 0 0 0 144.946228 -37.800320 {'lon': 144.9462277, 'lat': -37.80032041}
4 Pre May 16 X000996 COMPLETED 2013 North Melbourne 1012 201 Arden Street NORTH MELBOURNE VIC 3051 100552 None None ... 0 0 0 0 0 0 0 144.941047 -37.800299 {'lon': 144.9410467, 'lat': -37.80029861}

5 rows × 42 columns

As seen above, this dataset has many columns (42). We only want to focus on residential properties, so we will remove all property information except 'resi_dwellings'. Keep in mind we still want to keep the CLUE Small Area and CLUE Block, because we are going to visualise the number of residential dwellings in each block (similar to what we did in the job forecast analysis).

In [13]:
# Keep only the required columns to perform visualisation
dev_data = dev_data[['clue_small_area', 'clue_block', 'resi_dwellings']]

dev_data.head(5)
Out[13]:
clue_small_area clue_block resi_dwellings
0 West Melbourne (Residential) 411 10
1 West Melbourne (Residential) 401 31
2 North Melbourne 314 0
3 North Melbourne 330 16
4 North Melbourne 1012 0

Awesome! The dataset now only contains the information we need to create a choropleth map: the CLUE Small Area, the CLUE Block ID, and the number of residential dwellings corresponding to each block/area.

However, before we can create the map, we need to perform grouping and aggregation so we can gather dwelling information by each block area:

In [14]:
# Cast datatypes
dev_data['resi_dwellings'] = dev_data['resi_dwellings'].astype(int)
dev_data = dev_data.convert_dtypes()

# Group by fields
groupby = ['clue_block', 'clue_small_area']
           
# Aggregate by fields
aggregateby = {'resi_dwellings': ['sum']}

# Perform grouping and aggregation
dwellingsByBlock = pd.DataFrame(dev_data.groupby(groupby, as_index=False).agg(aggregateby))

dwellingsByBlock.columns = dwellingsByBlock.columns.map(''.join) # flatten column header
dwellingsByBlock.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) # rename to match GeoJSON extract
dwellingsByBlock.rename(columns={'resi_dwellingssum': 'dwelling_count'}, inplace=True)
dwellingsByBlock.head(5)
Out[14]:
clue_block clue_area dwelling_count
0 1 Melbourne (CBD) 385
1 2 Melbourne (CBD) 0
2 6 Melbourne (CBD) 0
3 11 Melbourne (CBD) 706
4 12 Melbourne (CBD) 33

As with the job forecast, we need CLUE geometry information to generate the choropleth map. This time, however, we want to plot the dwellings by each block (CLUE Block), not by CLUE Area.

We therefore need to import the geometry (shape) of each CLUE Block as a collection of latitude and longitude points. This data can be gathered from the Melbourne Open Data Portal in GeoJSON format. Below we extract the required CLUE Block data.

In [15]:
block_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/blocks-for-census-of-land-use-and-employment-clue/exports/geojson?limit=-1&offset=0&timezone=UTC'
r = requests.get(block_url)
response = r.json()
block = response

We can see that the 'block_id' field in the CLUE Block data contains the specific block of the given spatial boundary. We are going to use this to match with our dwellingsByBlock dataframe (the clue_block field).

Note: The 'block_id' field in the CLUE Block Data is the equivalent of the 'clue_block' field in the dwellingsByBlock data. We can map the dwelling count by each block thanks to these two fields matching between the datasets and acting as a key. We are basically plotting the dwellingsByBlock data by pairing each block (clue_block) with the CLUE block_id, where the CLUE Block dataset contains the actual geodata we need to map the blocks with their dwellings.
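The same kind of key check applies here: any 'clue_block' value with no matching 'block_id' polygon simply won't be drawn. A pandas merge with indicator=True can flag such blocks; below is a sketch using toy stand-ins for dwellingsByBlock and the GeoJSON block ids:

```python
import pandas as pd

# Toy stand-ins: dwelling counts keyed by clue_block, and the block ids
# for which the GeoJSON actually defines polygons
dwellingsByBlock = pd.DataFrame({'clue_block': ['1', '2', '99'],
                                 'dwelling_count': [385, 0, 12]})
geo_blocks = pd.DataFrame({'block_id': ['1', '2', '3']})

# A left merge with indicator=True flags blocks that have no polygon
check = dwellingsByBlock.merge(geo_blocks, left_on='clue_block',
                               right_on='block_id', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'clue_block'].tolist()
print(unmatched)  # ['99'] -> this block would be missing from the map
```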

Now using the choropleth_mapbox function we can display a map using the Block GeoJSON data to define the regions and the dwellingsByBlock dataframe to define the summarised data by block.

In [16]:
# Display the choropleth map
fig4 = px.choropleth_mapbox(dwellingsByBlock, # pass in the dwellings data
                           geojson=block, # pass in the block data
                           locations='clue_block', # locations of dwellings are clue_block (block_id)
                           color='dwelling_count', # colour corresponding to dwelling count of each block
                           color_continuous_scale=["red", "orangered", "orange",
                                                   "yellow", "greenyellow", "green"], # colour scale
                           range_color=(0, dwellingsByBlock['dwelling_count'].max()), # range of colour scale
                           featureidkey="properties.block_id", # match the block_id to clue_block
                           mapbox_style="carto-darkmatter", # style of map
                           zoom=12.15, # initial zoom
                           center = {"lat": -37.813, "lon": 144.945}, # centre of map
                           opacity=0.22, # opacity of highlighted blocks
                           hover_name='clue_area', # area displayed when hovered over
                           hover_data={'clue_block':True,'dwelling_count':True}, # data displayed when hovered over
                                                                               
                           labels={'dwelling_count':'Residential Dwellings', 'clue_block':'CLUE Block ID'}, # label changes
                                                                               
                           title='Completed and Under-Construction Residential Dwellings 2022', # title of map
                           width=950, height=800 # size of map
                          )

fig4.show()

You've now successfully plotted the number of residential dwellings by block in the City of Melbourne!

3.4. Free and Cheap Support Services

One key component of liveability is the number of support services, such as hospitals, near a given location, so it is important to include this in our visualisation. This dataset may be slightly dated, but given that free and cheap support services rarely change location, it is satisfactory for our purposes.

Let's take a look at the data:

In [17]:
ss_data.head(5)
Out[17]:
name what who address_1 address_2 suburb phone phone_2 free_call email ... nearest_train_station category_1 category_2 category_3 category_4 category_5 category_6 longitude latitude geocoded_location
0 Child Protection Emergency Service None None None None None 13 12 78 None None None ... None Helpful phone number N/A N/A N/A N/A N/A NaN NaN None
1 Gamblers Help Line Victoria None None None None None None None 1800 858 858 None ... None Helpful phone number N/A N/A N/A N/A N/A NaN NaN None
2 Kids Help line None None None None None None None 1800 551 800 None ... None Helpful phone number N/A N/A N/A N/A N/A NaN NaN None
3 Lifeline (24 hour crisis counselling) None None None None None 13 11 14 None None None ... None Helpful phone number N/A N/A N/A N/A N/A NaN NaN None
4 Narcotics Anonymous - Victorian Area Helpline None None None None None 9525 2833 None None info@navic.net.au ... None Helpful phone number Helpful website N/A N/A N/A N/A NaN NaN None

5 rows × 34 columns

Awesome! We can now see the structure of the data. We are focused on creating a visual of these locations; therefore, we only want to keep the name, service description, and latitude/longitude fields:

In [18]:
# Keep specific data
ss_data = ss_data[['name', 'what', 'latitude', 'longitude']]

# Print
ss_data.head(5)
Out[18]:
name what latitude longitude
0 Child Protection Emergency Service None NaN NaN
1 Gamblers Help Line Victoria None NaN NaN
2 Kids Help line None NaN NaN
3 Lifeline (24 hour crisis counselling) None NaN NaN
4 Narcotics Anonymous - Victorian Area Helpline None NaN NaN

The data now contains only useful information for plotting each of the locations.

However, we can see from above that some of our data doesn't contain latitude and longitude. This is because these services are online or phone based, for example, the Child Protection Emergency Service. We are only interested in brick-and-mortar services, so let's remove any rows that don't contain latitude and longitude:

In [19]:
# Remove null data
ss_data = ss_data.dropna()

# Reset the index of data frame
ss_data = ss_data.reset_index(drop=True)

ss_data.head(5)
Out[19]:
name what latitude longitude
0 Aboriginal Family Violence Prevention and Lega... Legal Services, Counselling Support, Informati... -37.806427 144.986299
1 Alcoholics Anonymous Victoria AA is a fellowship of men and women who share ... -37.811648 145.000307
2 Royal Melbourne Hospital Outpatients’ emergency service -37.798877 144.956177
3 Anglicare Victoria – St.Mark’s Community Centre St Mark’s provides assistance to homeless peop... -37.801611 144.981835
4 Brotherhood of St Laurence Coolibah Centre Breakfast $1.00 \nlunch $3, afternoon tea $0.2... -37.805286 144.977265

Awesome! We now have the data in the required format so we can plot the locations of the services. Let's again remove the data that falls outside the CBD area, applying the same bounding box as for the school locations:

In [20]:
# Get data into required type
ss_data['longitude'] = ss_data['longitude'].astype(float)
ss_data['latitude'] = ss_data['latitude'].astype(float)

# Remove if longitude is greater than or less than
ss_data.drop(ss_data[ss_data['longitude'] < 144.88824].index, inplace = True)
ss_data.drop(ss_data[ss_data['longitude'] > 145.00226].index, inplace = True)

# Remove if latitude is greater than or less than
ss_data.drop(ss_data[ss_data['latitude'] > -37.77682].index, inplace = True)
ss_data.drop(ss_data[ss_data['latitude'] < -37.85019].index, inplace = True)
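Since these are the same bounds we used for the school locations, the filter could be factored into a small helper so both datasets stay consistent. This is a sketch; the `within_bbox` name is our own:

```python
import pandas as pd

# CBD bounding box used for both the schools and the services
LON_MIN, LON_MAX = 144.88824, 145.00226
LAT_MIN, LAT_MAX = -37.85019, -37.77682

def within_bbox(df: pd.DataFrame, lon_col: str, lat_col: str) -> pd.DataFrame:
    """Keep only rows whose coordinates fall inside the CBD bounding box."""
    mask = (df[lon_col].between(LON_MIN, LON_MAX)
            & df[lat_col].between(LAT_MIN, LAT_MAX))
    return df[mask]

# Toy example: the first point is inside the box, the second is not
pts = pd.DataFrame({'longitude': [144.95, 142.59],
                    'latitude': [-37.81, -38.39]})
print(within_bbox(pts, 'longitude', 'latitude'))  # keeps only the first row
```

With this helper, the school filter becomes `sl_data = within_bbox(sl_data, 'X', 'Y')` and the services filter `ss_data = within_bbox(ss_data, 'longitude', 'latitude')`.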

We can now plot the data into a visualisation using a scatter mapbox:

In [21]:
fig5 = px.scatter_mapbox(ss_data, lat='latitude', lon='longitude', # plot on latitude and longitude
                        mapbox_style="carto-darkmatter", # style of map
                        zoom=12.15, # set initial zoom
                        center = {"lat": -37.813, "lon": 144.945}, # centre of the map
                        opacity=0.7, # opacity of each marker/dot
                        hover_name="name", # only display the service name when hovered
                        hover_data={"name":False,"what":True,"latitude":False,"longitude":False}, # data displayed
                        labels={'what':'Service'}, # change labels
                        title = 'Free and Cheap Services Locations 2020', # title of map
                        width=950, height=800) # size of map


fig5.update_traces(marker={'size': 10}) # change the marker size

fig5.show()

Awesome! We now have the locations of free and cheap support services plotted all around Melbourne!

4. Combining all Visualisations and Data

We have now plotted and analysed all of our required datasets, so we are going to combine them all into one visualisation that we can interact with and see the different areas of liveability.

To do so, we are going to create a base layer and then plot each dataset on top of it. First, let's create the base layer and set the map's default style and settings:

In [22]:
# Create the base figure to which layers(traces) will be added.
output_fig = go.Figure()

# Set the default style for the map
output_fig.update_layout(mapbox_style="carto-darkmatter")
output_fig.update_layout(hovermode='closest')
output_fig.update_layout(mapbox_center_lat=-37.813, mapbox_center_lon=144.945, mapbox_zoom=12.15)
output_fig.update_layout(width=975, height=800)
output_fig.update_layout(title='Liveability Visualisation Analysis')
output_fig.update_layout(coloraxis_colorscale='viridis')

# Assign the result back to output_fig to prevent it printing to screen
output_fig = output_fig.update_layout(coloraxis_colorbar={'title':'Jobs/Dwellings'})

Now that the base map has been set, we can add each of the dataset visualisations to the base layer. The first layers we will add are the single locational plots, i.e., the School Locations and the Health/Services Locations. To add these layers, we will first create new maps for them with slightly different characteristics from those above, so they can be combined into a single visualisation.

However, we are first going to add a new column to the locational data with the constant values 'Schools' and 'Health/Services'. We are doing this so we can create a legend on our final visualisation that shows which marker point is which.

In [23]:
sl_data['Plot_Type'] = 'Schools'
ss_data['Plot_Type'] = 'Health/Services'

sl_data.head(5)
Out[23]:
School_Name School_Type X Y Plot_Type
29 Flemington Primary School Primary 144.93392 -37.78067 Schools
30 Footscray Primary School Primary 144.89267 -37.79838 Schools
51 Fitzroy Primary School Primary 144.98151 -37.79960 Schools
70 South Yarra Primary School Primary 144.98562 -37.84135 Schools
183 Albert Park Primary School Primary 144.95277 -37.84188 Schools

We can now see that there is a new column called Plot_Type that will depict if the marker is a school or a service. Let's now create the traces for each marker and add it to our base plot.

The code below also plots both sets of data twice, as seen in visfig1 and visfig2. We are overlaying a smaller, more opaque marker on each point to give the final markers a layered, more polished look. Compare the output markers below with the earlier visualisations to see the difference.

In [24]:
# School location data 
fig1 = px.scatter_mapbox(sl_data, lat="Y", lon="X",
                        hover_name="School_Name",
                        hover_data={"School_Name":False,"School_Type":False,"X":False,"Y":False},
                        labels={'School_Name':'School Name', 'School_Type':'School Type'},
                        opacity=0.5, 
                        color_discrete_sequence=['blue'],
                        color='Plot_Type')
# Change marker size for school location points
fig1.update_traces(marker={'size':12})

# Service location data
fig2 = px.scatter_mapbox(ss_data, lat='latitude', lon='longitude', 
                        hover_name="name", 
                        hover_data={"name":False,"what":False,"latitude":False,"longitude":False},
                        opacity=0.5, 
                        color_discrete_sequence=['red'],
                        color='Plot_Type')
# Change marker size for service location points
fig2.update_traces(marker={'size':12})

# Add same location points but with smaller marker
# This gives nicer marker visualisation look
# Only for visual appeal

####
visfig1 = px.scatter_mapbox(ss_data, lat='latitude', lon='longitude', 
                        hover_name="name", 
                        hover_data={"name":False,"what":False,"latitude":False,"longitude":False},
                        opacity=1, 
                        color_discrete_sequence=['red'],)

visfig1.update_traces(marker={'size':5})

visfig2 = px.scatter_mapbox(sl_data, lat="Y", lon="X",
                        hover_name="School_Name",
                        hover_data={"School_Name":False,"School_Type":False,"X":False,"Y":False},
                        labels={'School_Name':'School Name', 'School_Type':'School Type'},
                        opacity=1, 
                        color_discrete_sequence=['blue'],)

visfig2.update_traces(marker={'size':5})
####

# Add both locational maps/traces to the base layer
output_fig.add_trace(fig1.data[0])
output_fig.add_trace(fig2.data[0])
output_fig.add_trace(visfig1.data[0])
output_fig.add_trace(visfig2.data[0])

Great! We now have a visually appealing map of the school locations and health/services around Melbourne. We also want to plot the other liveability characteristics, including residential dwellings and job forecasts. To do so, we will use a similar approach to the above, but swap the scatter mapbox for a choropleth mapbox, and then add the trace to our current base map.

In [25]:
# Create the forecasted jobs plot
fig3 = px.choropleth_mapbox(jobsByArea, geojson=area, locations='featurenam', color='forecasted_jobs',
                           range_color=(0, jobsByArea['forecasted_jobs'].max()),
                           featureidkey="properties.featurenam",
                           hover_name='featurenam',
                           hover_data={'featurenam':True,'forecasted_jobs':True},
                           labels={'forecasted_jobs':'Forecasted Jobs','featurenam':'Area'},
                           opacity=0.3,
                          )

# Add forecasted jobs layer to the base figure
output_fig.add_trace(fig3.data[0]) 

# Create the residential dwellings plot
fig4 = px.choropleth_mapbox(dwellingsByBlock, geojson=block, locations='clue_block', color='dwelling_count',
                           range_color=(0, dwellingsByBlock['dwelling_count'].max()),
                           featureidkey="properties.block_id",
                           hover_name='clue_area',
                           hover_data={'clue_block':True,'dwelling_count':True},
                           labels={'dwelling_count':'Residential Dwellings', 'clue_block':'CLUE Block ID'},
                           opacity=0.3
                          )

# Add residential dwellings layer to the base figure
# Store final plot in variable to prevent printing to screen
output_fig = output_fig.add_trace(fig4.data[0])

Let's also add a tool that allows us to input our own address and get it plotted on the map. To do this, we are going to use Nominatim, the geocoding service that powers OpenStreetMap. Given an address, it returns the corresponding latitude and longitude.

We will input our own address, and then pass it into a 'url' variable. Once done, we will use the requests package to get the JSON response from Nominatim, specifically, the latitude and longitude for the given address:

In [26]:
# Input your own address below
# Make sure it is copied exactly as Nominatim has it
address = 'Macarthur Road, Parkville, Melbourne, City of Melbourne, Victoria, 3052, Australia'

# Store in url using Nominatim's documented query-string form
url = 'https://nominatim.openstreetmap.org/search?q=' + urllib.parse.quote(address) + '&format=json'

# Get response (Nominatim's usage policy asks for an identifying User-Agent)
response = requests.get(url, headers={'User-Agent': 'liveability-use-case'}).json()

# X Address (Longitude) and Y Address (Latitude)
X_adr = response[0]["lon"]
Y_adr = response[0]["lat"]

# Print to screen
print(f"The latitude of your custom address is: {Y_adr}")
print(f"The longitude of your custom address is: {X_adr}")
The latitude of your custom address is: -37.7892998
The longitude of your custom address is: 144.9570502

Great! Now we have the latitude and longitude of our address. Feel free to substitute your own address, but make sure it is entered exactly as Nominatim records it.

Let's turn our lat/long pair into a data frame so we can add it to our plot.

In [27]:
# Create the data with corresponding labels
data = {'address': [address], # string address
        'X': [X_adr], # longitude
        'Y': [Y_adr], # latitude
        'Plot_Type':'Custom Location'} # plot_type for legend on final visualisation

# Create dataframe of data
adr_df = pd.DataFrame(data)

# Show dataframe
adr_df
Out[27]:
address X Y Plot_Type
0 Macarthur Road, Parkville, Melbourne, City of ... 144.9570502 -37.7892998 Custom Location

Awesome! The personal address is now inside a dataframe with X and Y coordinates, as well as the actual address and plot type (for visualisation). Our next step is to add this to our visualisation above. To do so, we are going to use the previous approach of a scatter mapbox.

Note: This custom address tool will be useful as we can see the exact location and analyse surrounding schools, dwellings, and suburbs with respect to population!
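One way to act on this note is to measure the straight-line distance from the custom address to each school and pick the nearest. A haversine sketch, where the two school coordinates come from the sl_data sample shown earlier and the `haversine_km` helper is illustrative:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Custom address (Parkville) and two schools from the sl_data sample above
home = (-37.7892998, 144.9570502)
schools = {
    'Flemington Primary School': (-37.78067, 144.93392),
    'South Yarra Primary School': (-37.84135, 144.98562),
}

nearest = min(schools, key=lambda s: haversine_km(*home, *schools[s]))
print(nearest)  # → Flemington Primary School (roughly 2 km away vs ~6 km)
```

The same helper could be applied across the full sl_data and ss_data tables to rank everything near the plotted address.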

In [28]:
# We are going to make multiple traces to make the point more visually appealing, as before

# First marker point
adr_fig = px.scatter_mapbox(adr_df, lat='Y', lon='X', 
                        hover_name="address",
                        hover_data={"address":False,"X":False,"Y":False, "Plot_Type":False}, 
                        opacity=0.4, 
                        color_discrete_sequence=['green'],)
adr_fig.update_traces(marker={'size': 30})

# Second marker point
adr_fig2 = px.scatter_mapbox(adr_df, lat='Y', lon='X', 
                        hover_name="address",
                        hover_data={"address":False,"X":False,"Y":False, "Plot_Type":False}, 
                        opacity=0.7, 
                        color_discrete_sequence=['limegreen'],)
adr_fig2.update_traces(marker={'size': 20})

# Third marker point
adr_fig3 = px.scatter_mapbox(adr_df, lat='Y', lon='X', 
                        hover_name="address",
                        hover_data={"address":False,"X":False,"Y":False, "Plot_Type":False}, 
                        opacity=0.9, 
                        color_discrete_sequence=['lime'],
                        color='Plot_Type')
adr_fig3.update_traces(marker={'size': 12})

# Add trace combination of points into the final visualisation
output_fig.add_trace(adr_fig.data[0])
output_fig.add_trace(adr_fig2.data[0])
output_fig = output_fig.add_trace(adr_fig3.data[0])

Now we have the final visualisation stored inside output_fig, but we need to add some usability to it, such as a drop-down menu that changes the data displayed on the map.

To do this, we first turn off the choropleth layers so the drop-down menu has a clean starting state, and then define its buttons.

Each button turns on the requested maps/layers when clicked. This is controlled through the 'visible' booleans in args, which determine whether each trace is displayed.

For example: if School Locations and Health/Service Locations are selected, we need the first four figures above to be displayed (remember, two of them exist purely for visual appeal, but we still want them shown). This corresponds to the visible boolean sequence [True, True, True, True, False, False, True].

Note: The final True boolean corresponds to our custom location
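Because the visibility list is positional (entry i toggles the i-th trace added to output_fig), it helps to write the trace order down explicitly. A plain-Python sketch of that mapping, where the names are illustrative labels only (Plotly itself identifies these traces purely by position):

```python
# Order in which traces were added to output_fig above
traces = ['schools', 'services', 'services_small', 'schools_small',
          'jobs_choropleth', 'dwellings_choropleth', 'custom_location']

# Visibility sequence for the first button (schools + services view)
visible = [True, True, True, True, False, False, True]

# Pair each trace with its boolean to see exactly what the button shows
shown = [name for name, v in zip(traces, visible) if v]
print(shown)  # → ['schools', 'services', 'services_small', 'schools_small', 'custom_location']
```

Writing the list out this way makes it much easier to audit each button's booleans before wiring them into updatemenus.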

In [29]:
# Turn off all choropleth layers
output_fig.update_traces(visible=False, selector=dict(type='choroplethmapbox'))

# Add buttons for selection on the visualisations
buttons = [dict(method='update',
                label='School Locations || Health/Services Locations',  visible=True,
                args=[{'visible':[True, True, True, True, False, False, True]}]),
           dict(method='update',
                label='Forecasted Jobs || Schools || Health/Services', visible=True,
                args=[{'visible':[True, True, True, True, True, False, True]}]),
           dict(method='update',
                label='Residential Dwellings || Schools || Health/Services', visible=True,
                args=[{'visible':[True, True, True, True, False, True, True]}])
          ]
                   
um_buttons = [{'active':0, 'showactive':True, 'buttons':buttons,
               'direction': 'down', 'xanchor': 'left','yanchor': 'bottom', 'x': 0.53, 'y': 1.06}]
map_annotations = [{'text':'Please select a map view to display', 'x': 1, 'y': 1.15,
                    'showarrow': False, 'font':{'family':'Arial','size':14}}]

# Add features to the visualisation
output_fig.update_layout(updatemenus=um_buttons, annotations=map_annotations)

# Change position of the legend
output_fig.update_layout(legend=dict(x=1,y=1.15))

# Display the visualisation
output_fig.show()
5. Findings and Opportunities

Our analysis and visualisations have provided a deeper understanding of the liveability components of Melbourne. Rather than simply searching for "the best areas", we can now visually inspect each suburb and its liveability factors.

We achieved in this analysis:

  • An interactive visualisation that allows users to physically view surrounding liveability components with a given input location

We learned from this analysis:

  • Most schools and services are clustered around the Melbourne CBD
  • Melbourne CBD has the highest number of forecasted jobs by 2040
  • Docklands has the largest number of completed and planned residential dwellings

Further opportunities:

  • Incorporating more datasets such as crime rate, or access to public transport
  • Algorithm/tool that computes the best suburb based on all input liveability factors
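The second opportunity, a tool that scores suburbs on weighted liveability factors, can be prototyped in a few lines: min-max normalise each factor to 0-1, then take a weighted sum. A minimal sketch, where the per-suburb figures and the weights are invented purely for illustration:

```python
import pandas as pd

# Hypothetical per-suburb factor counts (not real data)
suburbs = pd.DataFrame({
    'suburb': ['Docklands', 'Parkville', 'South Yarra'],
    'schools': [3, 5, 4],
    'services': [10, 6, 7],
    'forecasted_jobs': [40000, 15000, 20000],
    'dwellings': [9000, 3000, 5000],
})

# User-chosen importance of each factor (should sum to 1)
weights = {'schools': 0.2, 'services': 0.2, 'forecasted_jobs': 0.3, 'dwellings': 0.3}

# Min-max normalise each factor to 0-1, then take the weighted sum
factors = list(weights)
norm = (suburbs[factors] - suburbs[factors].min()) / (suburbs[factors].max() - suburbs[factors].min())
suburbs['score'] = sum(norm[f] * w for f, w in weights.items())

print(suburbs.sort_values('score', ascending=False)[['suburb', 'score']])
```

A real version would aggregate the school, service, jobs, and dwellings datasets used in this analysis into per-suburb counts, and let the user supply the weights interactively.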

Conclusion: Given its high availability of residential dwellings (housing) and its close proximity to the Melbourne CBD, where the most jobs are forecasted, Docklands proves to be the best-placed location in Melbourne. Furthermore, various schools, both primary and secondary, are situated close to Docklands, suggesting it would be a high-scoring suburb to live in.

6. Thank You!

Awesome work! The interactive map is now complete, and we can input custom locations to see the relevant liveability metrics around Melbourne, including schools, health services, forecasted jobs, and residential dwellings. Thank you for your time!